Domain-Independent Optimistic Initialization for Reinforcement Learning
Authors
Abstract
In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the feature representation must have a constant norm. We present a simple approach that performs optimistic initialization with less dependence on the domain.
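To make the domain dependence concrete, the sketch below illustrates the conventional optimistic initialization the abstract contrasts itself with: a tabular Q-table initialized to an upper bound on the return, r_max / (1 - gamma). This is not the paper's proposed method; the tabular setting, `r_max`, `gamma`, and the helper names are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of conventional (domain-dependent) optimistic initialization
# for tabular Q-learning. r_max (an assumed known bound on rewards) and gamma
# (the discount factor) are illustrative parameters, not taken from the paper.

def optimistic_q_table(n_states, n_actions, r_max=1.0, gamma=0.99):
    """Initialize every Q-value to an upper bound on the return, r_max / (1 - gamma).

    Unvisited state-action pairs look maximally rewarding, so a greedy agent is
    driven to try them. Note that this requires knowing the reward scale in
    advance -- the kind of domain knowledge the abstract refers to.
    """
    upper_bound = r_max / (1.0 - gamma)
    return np.full((n_states, n_actions), upper_bound)

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One standard Q-learning update; the optimism decays as pairs are visited."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

With function approximation the same trick additionally requires the feature representation to have a constant norm, which is the second dependence the abstract mentions.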
Similar Resources
Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to its model. The only trick of the algorithm is that the model is initialized optimistically. We pro...
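For illustration only, the sketch below shows the general idea of an optimistic initial model in a flat (non-factored) MDP, in the spirit of R-max-style constructions. It is a simplification under stated assumptions, not the FOIM algorithm itself; `r_max` and the fictitious absorbing state are illustrative choices.

```python
import numpy as np

# An R-max/OIM-style sketch of optimistic model initialization for a flat MDP.
# FOIM operates on factored MDPs and is not reproduced here.

def optimistic_initial_model(n_states, n_actions, r_max=1.0):
    # Add one fictitious absorbing state (index n_states) that pays r_max forever.
    S = n_states + 1
    P = np.zeros((S, n_actions, S))   # transition probabilities P[s, a, s']
    R = np.full((S, n_actions), r_max)
    # Before any data is seen, every (s, a) is believed to jump to the fictitious
    # state, so a greedy policy on this model is driven to try unvisited actions.
    P[:, :, n_states] = 1.0
    return P, R
```

As real transitions are observed, the empirical counts replace this optimistic prior, and the greedy policy gradually becomes accurate on the visited part of the MDP.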
Exploiting locality of interactions using a policy-gradient approach in multiagent learning
In this paper, we propose a policy gradient reinforcement learning algorithm to address transition-independent Dec-POMDPs. This approach aims at implicitly exploiting the locality of interaction observed in many practical problems. Our algorithms can be described by an actor-critic architecture: the actor component combines natural gradient updates with a varying learning rate; the critic uses ...
Practicing Q-Learning
Q-Learning has gained increasing attention as a promising real-time learning scheme from delayed reinforcement. Being compact, model-free, and theoretically optimal, it is commonly preferred to AHC-Learning and its derivatives. However, it has long been noticed that theoretical optimality has to be sacrificed in order to meet the constraints of most applications. In this article we report on expe...
Coordination of independent learners in cooperative Markov games
In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination or the impact of exploration. The study of these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficul...
Complexity Analysis of Real-Time Reinforcement Learning
This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous real-time versions of Q-learning and value iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm wa...
Journal: CoRR
Volume: abs/1410.4604
Pages: -
Publication year: 2015